An Empirical Study of Arabic Formulaic Sequence Extraction Methods

Authors

  • Ayman Alghamdi
  • Eric Atwell
  • Claire Brierley

Abstract

This paper implements what is referred to as the "collocation of Arabic keywords" approach for extracting formulaic sequences (FSs). Here, FSs are high-frequency but semantically regular formulas that are not restricted to any particular syntactic construction or semantic domain. The study applies several distributional semantic models to automatically extract relevant FSs related to Arabic keywords. The data sets used in this experiment are derived from a newly developed corpus-based Arabic wordlist of 5,189 lexical items, which represent a variety of Modern Standard Arabic (MSA) genres and regions. The wordlist is based on overlapping frequencies from a comprehensive comparison of four large Arabic corpora, totalling over 8 billion running words. Empirical n-best precision evaluation is used to determine which association measures (AMs) are best at extracting high-frequency, meaningful FSs. The gold-standard reference list of FSs was developed in previous studies and manually evaluated against well-established quantitative and qualitative criteria. The results show that the MI.log_f AM performed best at extracting significant FSs from the large MSA corpus, while the T-score AM performed worst.
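The abstract does not spell out the association measures or the evaluation procedure. The sketch below is a rough illustration only. It scores adjacent word pairs with MI, MI.log_f and T-score, then computes n-best precision (the share of the n top-ranked candidates found in a gold list). It makes the following assumptions:

- MI and T-score use their common textbook definitions.
- MI.log_f is taken to be MI × ln(f_AB + 1), as in Sketch Engine.
- Marginal frequencies are plain unigram counts.
- The function names, toy corpus and gold list are hypothetical.

It is not the authors' implementation.

```python
import math
from collections import Counter


# Hypothetical helper names; not taken from the paper.
def association_scores(tokens):
    """Score every adjacent word pair (a, b) in `tokens` with three association measures.

    Simplifying assumption: marginal frequencies f_A and f_B are plain unigram counts.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (a, b), f_ab in pair_freq.items():
        # Expected co-occurrence frequency if a and b were independent
        expected = word_freq[a] * word_freq[b] / n
        mi = math.log2(f_ab / expected)
        scores[(a, b)] = {
            "MI": mi,
            # Assumed definition (as in Sketch Engine): MI weighted by ln(f_AB + 1)
            "MI.log_f": mi * math.log(f_ab + 1),
            "T-score": (f_ab - expected) / math.sqrt(f_ab),
        }
    return scores


def n_best_precision(scores, measure, gold, n):
    """Fraction of the n top-ranked candidates under `measure` that appear in `gold`."""
    ranked = sorted(scores, key=lambda pair: scores[pair][measure], reverse=True)
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(pair in gold for pair in top) / len(top)


if __name__ == "__main__":
    # Toy corpus and hypothetical gold list, for illustration only
    tokens = "على سبيل المثال في الواقع على سبيل المثال قال الرجل في البيت".split()
    gold = {("على", "سبيل"), ("سبيل", "المثال")}
    scores = association_scores(tokens)
    for measure in ("MI", "MI.log_f", "T-score"):
        best = max(scores, key=lambda pair: scores[pair][measure])
        print(measure, best, n_best_precision(scores, measure, gold, n=2))
```

On this toy input, raw MI ranks (قال, الرجل) highest even though it occurs only once, so its 1-best precision is 0.0. This reflects MI's known tendency to reward rare co-occurrences. MI.log_f and T-score both give more weight to raw frequency. Each ranks the two repeated pairs of على سبيل المثال first, giving a 2-best precision of 1.0 here. A toy example like this says nothing about how the measures compare on the paper's data.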

Similar sources

A methodology for the extraction of information about the usage of formulaic expressions in scientific texts

In this paper, we present a methodology for the extraction of formulaic expressions, which goes beyond the mere extraction of candidate patterns. Using a pipeline we are able to extract information about the usage of formulaic expressions automatically from text corpora. According to Biber and Barbieri (2007) formulaic expressions are “important building blocks of discourse in spoken and writte...

Presenting an Empirical Correlation for Maximum Sauter Mean Diameter in a Spray Extraction Column

Given the importance of drop behavior in liquid-liquid extraction, the maximum Sauter mean drop diameter has been investigated and correlated in a counter-current spray extraction column with two chemical systems. Sets of nozzles were used as spargers in all experiments. Studying the effects of several parameters on drop size, some correlations were estimated by the latest available version of softw...

Developing EFL Learners' Oral Proficiency through Animation-based Instruction of English Formulaic Sequences

The current pretest-posttest quasi-experimental study attempts, first, to probe the effects of teaching formulaic sequences (FSs) on the improvement of second or foreign language (L2) learners' oral proficiency and, second, to examine whether teaching FSs through different resources (i.e., animation vs. text-based readings) has any differential effect in augmenting L2 l...

Formulaic Language in Alzheimer's Disease.

BACKGROUND Studies of productive language in Alzheimer's disease (AD) have focused on formal testing of syntax and semantics but have directed less attention to naturalistic discourse and formulaic language. Clinical observations suggest that individuals with AD retain the ability to produce formulaic language long after other cognitive abilities have deteriorated. AIMS This study quantifies ...

The Comparison of different Procedures for DNA extraction from paraffin-embedded Tissues: A commercial kit and a traditional method based on heating

Background and objectives: Paraffin-embedded tissues and clinical samples are a valuable resource for molecular genetic studies, but extracting high-quality genomic DNA from these tissues is still problematic. In the present study, the efficiency of two DNA extraction protocols, a commercial kit and a traditional method based on heating and proteinase K, was compared. Mate...

Publication date: 2016